Figure 1: Shot from the Crytek Sponza scene with semi-transparent bubbles added, lit by 1024 random lights. With 8x MSAA at 720p 
resolution, Tiled Deferred runs at 53 FPS (without bubbles), Tiled Forward at 52 FPS and Clustered Forward at 161 FPS on a GTX 680. The 
diagrams illustrate how transparent geometry affects clustered and tiled forward shading. 


1 Abstract 

We present details of Tiled and Clustered Forward Shading as 
applied to rendering transparent geometry and to Multisample 
Anti-Aliasing (MSAA). We detail how transparency and 
MSAA are supported, and present performance results measured on 
modern GPUs. 

Previous techniques for handling large numbers of lights are usually 
based on deferred shading [Andersson 2009; Lauritzen 2010]. 
However, deferred shading techniques struggle with impractically 
large frame buffers when MSAA is used, and make supporting 
transparency difficult. In addition, deferred shading makes it more 
difficult to support custom shaders on geometry. 

Tiled Forward Shading is a new and highly practical approach to 
real-time shading of scenes with thousands of light sources, 
introduced by Olsson and Assarsson [2011]. Their results, measured 
on a GTX 280 GPU, indicated that tiled forward shading 
was impractically slow. Performance on more recent GPUs has 
improved considerably (approaching that of tiled deferred), which 
opens up the possibility of using the technique to support 
transparency and MSAA. 

Clustered Shading further extends tiled shading by adding depth 
partitioning [Olsson et al. 2012]. We show how Clustered Forward 
Shading can be extended to support transparency efficiently. 
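To make "adding depth partitioning" concrete, the following is a minimal sketch of how a sample might be assigned to a cluster: a 2D screen tile plus a depth slice. The exponential slicing formula is the one we understand the cited clustered shading paper [Olsson et al. 2012] to use; the function name, tile size, near plane distance and field of view are illustrative assumptions, not values taken from this text.

```python
import math


def cluster_key(px, py, view_depth, tile_size=32, near=0.1,
                fov_y=math.radians(60.0), screen_height=720):
    """Return a hypothetical (i, j, k) cluster key for one sample.

    px, py      -- pixel coordinates of the sample
    view_depth  -- positive view-space distance along the view direction
    All default parameter values are assumptions chosen for illustration.
    """
    # Tiled shading: i and j index the screen-space tile.
    i = px // tile_size
    j = py // tile_size

    # Clustered shading adds k, a depth slice. Exponential spacing keeps
    # clusters roughly cube-shaped as distance from the camera grows.
    tiles_y = math.ceil(screen_height / tile_size)
    growth = 1.0 + 2.0 * math.tan(fov_y / 2.0) / tiles_y
    depth = max(view_depth, near)  # clamp so the logarithm stays >= 0
    k = int(math.floor(math.log(depth / near) / math.log(growth)))
    return i, j, k


# Two samples in the same tile but at different depths, such as a bubble
# in front of a wall, land in different clusters.
bubble = cluster_key(100, 50, 1.0)
wall = cluster_key(100, 50, 50.0)
print(bubble, wall)
```

With tiles alone, both samples would share one light list covering the full depth range of the tile; the extra index lets each depth slice carry its own list.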

Forward shading naturally supports both transparency and MSAA, 
which has been shown in previous work. However, the performance 
and implementation details have not previously been investigated. 


around actual samples that need shading, efficiency is much better 
(Figure 1, left). 

For deferred shading, a single 1080p, 16x MSAA, 16-bit float 
RGBA buffer requires over 250 MB of memory. In addition, each 
sample may need to be shaded individually, effectively running 
shading at a per-sample frequency. For forward shading, no 
G-buffers are required and MSAA is trivially enabled. 
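The buffer size quoted above can be checked with a short calculation. This is a sketch that assumes uncompressed storage with one full RGBA value per sample; the helper name is hypothetical.

```python
def msaa_buffer_bytes(width, height, samples, bytes_per_sample):
    """Size of an uncompressed multisampled buffer, in bytes."""
    return width * height * samples * bytes_per_sample


# 16-bit float RGBA: 4 channels x 2 bytes each.
RGBA16F_BYTES = 4 * 2

size = msaa_buffer_bytes(1920, 1080, 16, RGBA16F_BYTES)
print(f"{size} bytes = {size / 2**20:.1f} MiB")
```

A deferred renderer typically needs several such buffers, so the total grows with the G-buffer count as well as with the sample count.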

A brief performance and memory comparison is shown in Figure 2, 
showing that clustered forward outperforms tiled forward by more 
than a factor of two, and also outperforms tiled deferred when MSAA is used. 



Figure 2: Left, performance for a view similar to Figure 1 (deferred 
without bubbles). Right, memory use of deferred vs. forward at 
720p, assuming 32-bit depth and color targets, and 3 x 64-bit 
G-buffers. 
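The memory comparison in Figure 2 can be approximated from the assumptions stated in its caption. The sketch below assumes that every target is stored at full MSAA sample rate and that deferred shading keeps the depth and color targets in addition to its G-buffers; this accounting is our assumption and may differ from how the plotted figures were obtained.

```python
DEPTH_BYTES = 4      # 32-bit depth target
COLOR_BYTES = 4      # 32-bit color target
G_BUFFER_BYTES = 8   # one 64-bit G-buffer


def frame_memory_bytes(width, height, samples, num_g_buffers):
    """Approximate render target memory, all targets at MSAA sample rate."""
    per_sample = DEPTH_BYTES + COLOR_BYTES + num_g_buffers * G_BUFFER_BYTES
    return width * height * samples * per_sample


for samples in (1, 2, 4, 8, 16):
    forward = frame_memory_bytes(1280, 720, samples, 0)
    deferred = frame_memory_bytes(1280, 720, samples, 3)
    print(f"{samples:2d}x MSAA: forward {forward / 2**20:6.1f} MiB, "
          f"deferred {deferred / 2**20:6.1f} MiB")
```

Under these assumptions deferred shading uses four times the memory of forward shading at every sample count, since both scale linearly with the number of samples.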